Coursera capstone project

This Jupyter notebook was created for the development of the Coursera capstone project.

Import libraries

Opening a Gym in the Neighborhoods of Toronto

Part 1: Prepare the data, use of Foursquare

First, we have to import the libraries that we will use for web scraping, such as Beautiful Soup, and the Folium library for map representations.

Next, we have to obtain the postal code data. We will use the data from Wikipedia.
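To make the scraping step concrete, here is a minimal sketch of pulling the rows out of an HTML table. It uses the standard-library `html.parser` instead of Beautiful Soup so that it runs without extra installs. The `TableRows` class, the `sample_html` snippet, and the column names are assumptions for illustration, not the notebook's actual code or the live Wikipedia page.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for the downloaded page (illustrative rows only).
sample_html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
header, records = parser.rows[0], parser.rows[1:]
print(header)
print(records)
```

With Beautiful Soup the same idea is usually shorter, but the row-by-row structure of the result is the same: one list of cell strings per table row.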

We have 103 rows in our dataset, with 3 columns. Let's check if we have any missing values.

Now the data is cleaned: we don't have any missing values or any "Not assigned" rows. The next step is to add the latitude and longitude of each location. This is a necessary step before using the Foursquare location data, because Foursquare queries are made by coordinates.
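A check of this kind could look like the sketch below. It works on a plain list of dictionaries rather than a pandas dataframe, and the column names, the sample rows, and the rule of falling back to the borough name are assumptions for illustration.

```python
# Illustrative rows only; the notebook's dataset has 103 rows and 3 columns.
rows = [
    {"PostalCode": "M1A", "Borough": "Not assigned", "Neighborhood": "Not assigned"},
    {"PostalCode": "M3A", "Borough": "North York", "Neighborhood": "Parkwoods"},
    {"PostalCode": "M5A", "Borough": "Downtown Toronto", "Neighborhood": ""},
]


def is_missing(value):
    """Treat None, empty strings and the 'Not assigned' placeholder as missing."""
    return value is None or value.strip() in ("", "Not assigned")


# Drop rows without a borough, since they cannot be placed on the map.
cleaned = [dict(r) for r in rows if not is_missing(r["Borough"])]

# Assumption: if only the neighborhood is missing, reuse the borough name.
for r in cleaned:
    if is_missing(r["Neighborhood"]):
        r["Neighborhood"] = r["Borough"]

# Count the missing values left in each column; every count should be zero.
missing_per_column = {
    col: sum(is_missing(r[col]) for r in cleaned) for col in cleaned[0]
}
print(missing_per_column)
```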

Retrieve geographical coordinates using Geocoder
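Geocoding services sometimes return nothing for a query, so it helps to retry a few times before giving up. The sketch below shows that pattern with a stand-in geocoder. `lookup_latlng`, `flaky_geocode`, and the coordinates are hypothetical; with the geocoder package one would pass a real lookup function instead.

```python
import time


def lookup_latlng(postal_code, geocode, max_tries=5, pause=0.0):
    """Call `geocode` until it returns a (lat, lng) pair or the tries run out."""
    query = "{}, Toronto, Ontario".format(postal_code)
    for _ in range(max_tries):
        coords = geocode(query)
        if coords is not None:
            return tuple(coords)
        time.sleep(pause)  # small wait between attempts
    return None


# Stand-in geocoder (hypothetical): fails twice, then answers.
# Assumption: with the geocoder package, something like
# `lambda q: geocoder.arcgis(q).latlng` could be passed here instead.
calls = {"n": 0}


def flaky_geocode(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [43.7280, -79.3888]


latlng = lookup_latlng("M4N", flaky_geocode)
print(latlng)
```

Bounding the number of tries avoids an endless loop when a postal code cannot be resolved at all.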

Now, we are going to make a copy of the dataframe.

Next, we are going to use Foursquare to look for venues.

Now, let's get the top 100 venues that are in Lawrence Park within a radius of 300 meters.
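As a sketch of what such a request looks like, the code below only builds the query URL and does not call the service. It assumes the legacy Foursquare v2 `venues/explore` endpoint; the credentials, the version date, and the coordinates for Lawrence Park are placeholders, and newer versions of the Foursquare API use a different interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def explore_url(lat, lng, client_id, client_secret, version, radius=300, limit=100):
    """Build a query URL for the legacy v2 'explore' endpoint (assumed here)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "ll": "{},{}".format(lat, lng),  # latitude,longitude
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials and approximate coordinates (assumptions).
url = explore_url(43.7280, -79.3888, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605")
query = parse_qs(urlsplit(url).query)
print(url)
```

The `radius` and `limit` defaults mirror the 300 meters and 100 venues mentioned above.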

How many unique venues are there across all the neighborhoods?
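Counting unique values is a one-liner once the venues are in a table. The sketch below uses a small hypothetical list of records; the field names and venue names are assumptions and the real data comes from the Foursquare responses.

```python
# Hypothetical venue records for illustration.
venues = [
    {"Neighborhood": "Lawrence Park", "Venue": "Venue A", "Venue Category": "Park"},
    {"Neighborhood": "Lawrence Park", "Venue": "Venue B", "Venue Category": "Gym"},
    {"Neighborhood": "Rosedale", "Venue": "Venue C", "Venue Category": "Park"},
    {"Neighborhood": "Rosedale", "Venue": "Venue B", "Venue Category": "Gym"},
]


def unique_values(records, key):
    """Return the sorted distinct values found under `key`."""
    return sorted({r[key] for r in records})


unique_names = unique_values(venues, "Venue")
unique_categories = unique_values(venues, "Venue Category")
print(len(unique_names), "unique venues,", len(unique_categories), "unique categories")
```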

Part 2: Data analysis

The "Venue Category" column contains categorical values, so we need to convert it to numerical values using the dummies function (one-hot encoding).
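To show what the conversion produces, here is a plain-Python sketch of one-hot encoding: each category becomes its own 0/1 column. The sample venues are hypothetical, and in pandas the same result is normally obtained in one call with `pandas.get_dummies`.

```python
# Hypothetical (neighborhood, category) pairs for illustration.
venues = [
    ("Lawrence Park", "Gym"),
    ("Lawrence Park", "Park"),
    ("Rosedale", "Park"),
    ("Rosedale", "Playground"),
]

# One column per distinct category, in a stable order.
categories = sorted({cat for _, cat in venues})

# Each row gets a 1 in the column of its own category and 0 elsewhere.
onehot = [
    {"Neighborhood": name, **{c: int(c == cat) for c in categories}}
    for name, cat in venues
]

for row in onehot:
    print(row)
```

Averaging these 0/1 columns per neighborhood then gives the frequency of each category, which is a common input for clustering.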

Clustering

We will use k-means clustering, but first we will find the best value of k using the elbow method.

Here we can see that the best value for k is 3, so we will have our data grouped into 3 clusters.
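The elbow method runs k-means for several values of k and records the within-cluster sum of squared distances (the inertia); the "elbow" is the k after which the inertia stops dropping sharply. The sketch below is a small one-dimensional k-means written from scratch with a deterministic start, on made-up data. `kmeans_1d` is a hypothetical helper; a library such as scikit-learn exposes the same quantity as the `inertia_` attribute of a fitted `KMeans` model.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on 1-D data; returns (centroids, inertia)."""
    pts = sorted(values)
    n = len(pts)
    # Deterministic start: one centroid from the middle of each of k equal slices.
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((x - c) ** 2 for c in centroids) for x in pts)
    return centroids, inertia


# Made-up feature values forming three obvious groups.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]

inertias = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

On this toy data the inertia falls steeply up to k = 3 and barely changes afterwards, which is the shape to look for in the elbow plot.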

We see that there are a total of 72 locations with gyms in Toronto. We will create a new dataframe with the neighborhoods and their gyms.
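Building that table amounts to counting the gym venues per neighborhood. The sketch below does it with `collections.Counter` on hypothetical data; matching any category that contains the word "gym" is an assumption, and the notebook may filter on an exact category name instead.

```python
from collections import Counter

# Hypothetical (neighborhood, category) pairs for illustration.
venues = [
    ("Lawrence Park", "Gym"),
    ("Lawrence Park", "Park"),
    ("Rosedale", "Gym / Fitness Center"),
    ("Rosedale", "Gym"),
    ("Parkwoods", "Park"),
]

# Assumption: a venue counts as a gym if its category mentions "gym".
gym_counts = Counter(name for name, cat in venues if "gym" in cat.lower())

# Keep every neighborhood in the table, including those without gyms.
neighborhoods = sorted({name for name, _ in venues})
gym_table = [{"Neighborhood": n, "Gyms": gym_counts.get(n, 0)} for n in neighborhoods]

for row in gym_table:
    print(row)
```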

Visualize Clustering on Google Map

Analysis of each Cluster

Number of neighborhoods per cluster vs Average number of Gym in each Cluster
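The comparison can be computed by grouping the neighborhoods by cluster label. The sketch below uses hypothetical rows and column names; it returns, for each cluster, the number of neighborhoods and the average number of gyms.

```python
from collections import defaultdict

# Hypothetical clustered rows for illustration.
rows = [
    {"Neighborhood": "A", "Cluster": 0, "Gyms": 0},
    {"Neighborhood": "B", "Cluster": 0, "Gyms": 1},
    {"Neighborhood": "C", "Cluster": 1, "Gyms": 4},
    {"Neighborhood": "D", "Cluster": 2, "Gyms": 2},
    {"Neighborhood": "E", "Cluster": 2, "Gyms": 2},
    {"Neighborhood": "F", "Cluster": 2, "Gyms": 5},
]

# Collect the gym counts of the neighborhoods in each cluster.
gyms_by_cluster = defaultdict(list)
for r in rows:
    gyms_by_cluster[r["Cluster"]].append(r["Gyms"])

# Summarize: how many neighborhoods, and the mean gym count, per cluster.
summary = {
    cluster: {"neighborhoods": len(g), "avg_gyms": sum(g) / len(g)}
    for cluster, g in sorted(gyms_by_cluster.items())
}

for cluster, stats in summary.items():
    print(cluster, stats)
```

A cluster with many neighborhoods but a low average gym count would be a natural candidate area for a new gym.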